Abstract

Consider a chain complex whose only modification is that each cell of the
complex has a probability distribution assigned to it. We call such a complex
a random complex. In practice, this means that we have a classical chain complex
whose cells appear and disappear according to some probability distributions.
In this paper, we try to find the stochastic homology of a random complex whose
simplices have independent discrete distributions.

Computer technology is now so widespread that almost everyone in the world
relies on some sort of computational device. Devices such as personal computers,
cell phones, and global positioning systems are part of our lives, mostly because
of their ability to perform millions of complex mathematical operations within a
second. Even so, the human brain is still unbeatable in its ability to approximate
solutions of complex problems. Here is what I mean:

Everybody has seen a satellite map on Google Earth like the one of the Uptown 
campus of Tulane University below. 




Tulane's Uptown campus 

One very significant feature of most of the university campuses is the net of 
paths between the buildings like the one on Tulane's campus. Here is another 
picture - this time from the area where I currently live. 




Uptown New Orleans 

Definitely, if one sees two pictures, one showing the net of paths on Tulane's
campus and one taken anywhere in the residential area, one can distinguish
which is from the campus and which is not simply by observing the gray net of
paths in one of them. Notice that we do not restrict ourselves to the angle which
captures the tracks on the campus, nor the scale. This uncountable set of pictures
showing gray paths and green grass can be identified by a simple topological
invariant, namely

$$b_0^{\text{Green}} = 22,\quad b_1^{\text{Green}} = 0, \qquad b_0^{\text{Gray}} = 1,\quad b_1^{\text{Gray}} = 22.$$



Here we suppose that the paths create 22 cycles, that all paths enclose grass
regions, and that the paths are connected. So any picture which has more than a
few gray cycles can be considered to be from Tulane's campus. Of course, this
might not be true in general, but if the maps which are covered are small enough
and the pictures have enough detail, there are certainly uniqueness conditions
which imply that a picture belongs to a certain region. It all seems so easy: get
two satellite pictures of the regions you want to recognize, threshold them by
subdividing the RGB cube into smaller cubes, get a representation of each region
in terms of betti numbers corresponding to certain colors using techniques like
persistent homology, do the same for the picture which was taken from the ground,
and hope that you are lucky enough to get unique sets of betti numbers in your
picture, so that you can guess which region it corresponds to. However, once you
get your hands dirty with real data, it turns out that it is not THAT easy. In
general, each picture contains so much noise that it is very difficult to get the
actual betti numbers. Trees, cars, people, and shadows are just the starting
point of the troubles one encounters. Take a look at the next picture




Close capture of Tulane's Uptown campus 

and try to count the gray cycles. You should be able to count at least 10 of them, 
and all of them are approximations made by your brain. On the other hand, trying
to compute the number of cycles by subdivision of the RGB cube gives terrible
results no matter what subdivision of the RGB cube you take. Thus we need
a better tool to capture the topological data in the picture. Something which 
would allow us to guess whether a darker region really exists, or belongs to a near 
lighter region, or is simply a person, etc. We need methods which give freedom 
to eliminate noise, no matter how big the source is. This brings us to the idea of 
stochastic homology. 

Loosely speaking, stochastic homology is an extension of the idea of homology.
We build our theory as in classical homology theory, by looking at cell complexes
and boundary maps. This time, however, the fundamental unit is called a random
complex, which is a set of random cells. Each random cell differs from a classical
cell in that it has a probability distribution assigned to it. In this paper we
only consider random complexes with discrete probabilities. You can think of those
probabilities as the probabilities of existence of each cell. A probability of 1/2
for a certain cell means that the cell exists only half of the time. We also
suppose that all cells are independently distributed, since dependence between
cells is the same as treating those cells as one, and thus is not very interesting
to consider. Also, we only consider maximal complexes, as any complex sits inside
a maximal one.

1 The First Example 



Suppose we are given a very simple chain complex of two points, 1 and 2, and an
edge joining them. As noted before, the random complex has probabilities assigned
to its cells. Let us denote the probabilities of the vertices 1 and 2 by $p_1$ and
$p_2$ respectively, and the probability of the edge by $p_{12}$. Let us assume for
concreteness that $p_1 = 1/2$, $p_2 = 1/4$ and $p_{12} = 1/3$. Thus point 1 appears
half of the time, point 2 appears 1/4 of the time, and the edge between them
appears 1/3 of the time when possible, i.e. when both of its endpoints exist. This
means that the random complex consists of two points connected by an edge
1/24 (= 1/2 * 1/4 * 1/3) of the time; it consists of the two points only
1/12 (= 1/2 * 1/4 * (1 - 1/3)) of the time; 3/8 (= 1/2 * (1 - 1/4)) of the time we
see point 1 only; 1/8 (= (1 - 1/2) * 1/4) of the time we see point 2 only; and in
the remaining 3/8 (= (1 - 1/2) * (1 - 1/4)) of the time none of the cells exist and
the complex is the empty set.



Subcomplexes of the random complex, with probabilities 3/8 (empty set), 3/8 (point 1
only), 1/8 (point 2 only), 1/12 (both points, no edge) and 1/24 (both points joined
by the edge)

We have thus split the random complex into subcomplexes in the classical sense,
and to each subcomplex we can assign a probability, or time of existence. Each
subcomplex has certain classical topological invariants such as Euler
characteristic, homology, cohomology, homotopy groups, etc. Informally, we can
define the class of expected invariants as

$$\text{expected invariant} = \sum_{A} \big(\text{value of the classical invariant}(A)\big) \cdot \text{probability}(A),$$

where $A$ runs over all possible chain subcomplexes. Using the last formula, it is
very easy to compute the expected number of components, denoted by $b_0^E$, of the
above complex; it is given by

$$b_0^E = 0 \cdot 3/8 + 1 \cdot 3/8 + 1 \cdot 1/8 + 2 \cdot 1/12 + 1 \cdot 1/24 = 17/24.$$
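The computation above can be reproduced mechanically. The following is a minimal sketch, not part of the original text: the function name `expected_b0_two_points` is hypothetical, and it assumes (as in the example) that the edge can exist only when both endpoints exist, with conditional probability $p_{12}$.

```python
from fractions import Fraction
from itertools import product


def expected_b0_two_points(p1, p2, p12):
    """Expected number of components of the random complex {1, 2, edge 12}."""
    total = Fraction(0)
    for v1, v2, e in product([0, 1], repeat=3):
        if e and not (v1 and v2):
            continue  # the edge cannot exist without both endpoints
        # probability of this vertex configuration
        prob = (p1 if v1 else 1 - p1) * (p2 if v2 else 1 - p2)
        if v1 and v2:
            # the edge state only matters when both endpoints exist
            prob *= p12 if e else 1 - p12
        components = v1 + v2 - e  # an existing edge merges the two points
        total += components * prob
    return total


print(expected_b0_two_points(Fraction(1, 2), Fraction(1, 4), Fraction(1, 3)))
```

Exact rational arithmetic is used so that the result can be compared with the fraction computed by hand.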



Of course, in this simple case the expected number of components is equal to the
expected Euler characteristic, denoted by $\chi^E$, so $\chi^E = 17/24$ also.
Although already mentioned, let us give the following

Definition 1. The expected k-th betti number $b_k^E$ is the expectation of the
classical k-th betti number over all possible configurations of complexes, i.e.

$$b_k^E = \sum_{A} b_k(A)\, p(A),$$

where $A$ runs over all possible chain complexes.



Consider the previous example and let us try to find a formula for $b_0^E$ in terms
of the probabilities $p_1$, $p_2$ and $p_{12}$. Using the above arguments, we have

$$
\begin{aligned}
b_0^E ={}& 0 \cdot (1 - p_1)(1 - p_2) + 1 \cdot p_1 (1 - p_2) + 1 \cdot (1 - p_1) p_2 \\
&+ 2 \cdot p_1 p_2 (1 - p_{12}) + 1 \cdot p_1 p_2 p_{12} = p_1 + p_2 - p_1 p_2 p_{12}.
\end{aligned}
$$

Note that the probability $p_{12}$ appears only when both points exist in the
complex. Also, for the sake of short notation we will only write $p_{12}$ for the
term $p_1 p_2 p_{12}$; that is, it should be understood that the probabilities of
the lower dimensional cells are already incorporated into the probability of the
higher dimensional cell. There is a nice geometric representation for such
formulas and we prefer to use it whenever possible. The coefficient of each
summand appears in the upper left corner.

Geometric representation of $b_0^E$ of a maximal random complex over two points

Actually, if one simplifies the expected zeroth betti number of the maximal
random complex over 3 points, it turns out that

$$b_0^E = p_1 + p_2 + p_3 - p_{12} - p_{13} - p_{23} + p_{12}\, p_{13}\, p_{23}$$

Geometric representation of $b_0^E$ of a maximal random complex over three points
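Closed forms like the two-point and three-point formulas can be checked by brute force. The sketch below is an illustration under stated assumptions rather than the paper's method: `count_components` and `expected_b0` are hypothetical helper names, vertices exist independently, and each edge exists with its given probability conditional on both endpoints existing. The shorthand $p_{ij}$ is expanded to $p_i p_j p_{ij}$ and the product of the three edge terms is reduced to $p_1 p_2 p_3 p_{12} p_{13} p_{23}$.

```python
from fractions import Fraction
from itertools import product


def count_components(vertices, edges):
    """Number of connected components, via a small union-find."""
    parent = {v: v for v in vertices}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for a, b in edges:
        parent[find(a)] = find(b)
    return len({find(v) for v in vertices})


def expected_b0(p_vertex, p_edge):
    """Expected zeroth betti number by enumerating every configuration.

    p_vertex maps vertex -> probability; p_edge maps (a, b) -> probability
    that the edge exists given that both endpoints exist.
    """
    verts = sorted(p_vertex)
    total = Fraction(0)
    for vstate in product([0, 1], repeat=len(verts)):
        alive = [v for v, s in zip(verts, vstate) if s]
        pv = Fraction(1)
        for v, s in zip(verts, vstate):
            pv *= p_vertex[v] if s else 1 - p_vertex[v]
        possible = [e for e in p_edge if e[0] in alive and e[1] in alive]
        for estate in product([0, 1], repeat=len(possible)):
            pe = Fraction(1)
            present = []
            for e, s in zip(possible, estate):
                pe *= p_edge[e] if s else 1 - p_edge[e]
                if s:
                    present.append(e)
            total += count_components(alive, present) * pv * pe
    return total


pv = {1: Fraction(1, 2), 2: Fraction(1, 3), 3: Fraction(1, 4)}
pe = {(1, 2): Fraction(1, 2), (1, 3): Fraction(1, 3), (2, 3): Fraction(1, 5)}
closed_form = (
    pv[1] + pv[2] + pv[3]
    - pv[1] * pv[2] * pe[(1, 2)]
    - pv[1] * pv[3] * pe[(1, 3)]
    - pv[2] * pv[3] * pe[(2, 3)]
    + pv[1] * pv[2] * pv[3] * pe[(1, 2)] * pe[(1, 3)] * pe[(2, 3)]
)
print(expected_b0(pv, pe) == closed_form)
```

The enumeration grows exponentially with the number of cells, so it is only useful as a check on small complexes.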

and it seems that we can already guess what the general formula for the expected
zeroth betti number of the maximal random complex over n points could be. Well,
not quite! In fact, the formula for the maximal four-point complex is given by

Geometric representation of $b_0^E$ of a maximal random complex over four points

and the one for five points is given by

Geometric representation of $b_0^E$ of a maximal random complex over five points

The formula for the maximal random complex over six points includes 12987
terms. However, the total number of subcomplexes of the maximal complex over
n points is given by

$$\sum_{k=0}^{n} \binom{n}{k}\, 2^{\binom{k}{2}}.$$

The following table lists this number:

Points    Complexes
1         2
2         5
3         18
4         113
5         1450
6         40069
7         2350602
8         286192513
9         71213783666
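The counts in the table can be regenerated directly from the formula. This is a small sketch, assuming the count is over choices of $k$ existing points together with any subset of the $\binom{k}{2}$ edges among them; the function name `count_subcomplexes` is hypothetical.

```python
from math import comb


def count_subcomplexes(n):
    """Number of vertex-and-edge subcomplexes of the maximal complex over n points.

    Choose which k of the n points exist, then choose any subset of the
    comb(k, 2) edges between the existing points.
    """
    return sum(comb(n, k) * 2 ** comb(k, 2) for k in range(n + 1))


for n in range(1, 10):
    print(n, count_subcomplexes(n))
```

Python integers are arbitrary precision, so the last term $2^{36}$ for nine points is included without overflow.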

Among this mess of summands, we will try to find a recurrence. 



2 Term Structure of $b_0^E$

The first natural question we may ask is about the expected zeroth betti number
of a random complex with n points and no edges. The answer is given by the next

Claim 1. Let the random complex X consist of n points only. Then the expected
zeroth betti number of X is

$$b_0^E(X) = p_1 + \cdots + p_n.$$

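Proof. If we sum over the number of points, i.e. over the number of connected
components, we have the following identity

$$
\begin{aligned}
&\sum_{i=1}^{n} p_i (1-p_1)\cdots\widehat{(1-p_i)}\cdots(1-p_n) \\
&\quad + 2\sum_{i<j} p_i p_j (1-p_1)\cdots\widehat{(1-p_i)}\cdots\widehat{(1-p_j)}\cdots(1-p_n) \\
&\quad + \cdots \\
&\quad + n\,p_1\cdots p_n = p_1+\cdots+p_n,
\end{aligned}
\tag{1}
$$

where a hat marks an omitted factor and the coefficient in front of each sum is
the betti number of the corresponding complexes. The proof of equation (1) is by
induction. Obviously, for n = 1 the formula is true. Suppose that equation (1) is
true for n and consider the following sum

$$
\begin{aligned}
&\sum_{i=1}^{n+1} p_i (1-p_1)\cdots\widehat{(1-p_i)}\cdots(1-p_{n+1}) \\
&+ 2\sum_{i<j} p_i p_j (1-p_1)\cdots\widehat{(1-p_i)}\cdots\widehat{(1-p_j)}\cdots(1-p_{n+1}) \\
&+ 3\sum_{i<j<k} p_i p_j p_k (1-p_1)\cdots\widehat{(1-p_i)}\cdots\widehat{(1-p_j)}\cdots\widehat{(1-p_k)}\cdots(1-p_{n+1}) \\
&+ \cdots \\
&+ n\sum_{i=1}^{n+1} (1-p_i)\,p_1\cdots\widehat{p_i}\cdots p_{n+1} \\
&+ (n+1)\,p_1\cdots p_{n+1} \\
={}& (1-p_{n+1})\sum_{i=1}^{n} p_i (1-p_1)\cdots\widehat{(1-p_i)}\cdots(1-p_n) + p_{n+1}(1-p_1)\cdots(1-p_n) \\
&+ 2(1-p_{n+1})\sum_{i<j} p_i p_j (1-p_1)\cdots\widehat{(1-p_i)}\cdots\widehat{(1-p_j)}\cdots(1-p_n) \\
&+ 2p_{n+1}\sum_{i=1}^{n} p_i (1-p_1)\cdots\widehat{(1-p_i)}\cdots(1-p_n) \\
&+ 3(1-p_{n+1})\sum_{i<j<k} p_i p_j p_k (1-p_1)\cdots\widehat{(1-p_i)}\cdots\widehat{(1-p_j)}\cdots\widehat{(1-p_k)}\cdots(1-p_n) \\
&+ 3p_{n+1}\sum_{i<j} p_i p_j (1-p_1)\cdots\widehat{(1-p_i)}\cdots\widehat{(1-p_j)}\cdots(1-p_n) \\
&+ \cdots \\
&+ (1-p_{n+1})\,n\,p_1\cdots p_n + n\,p_{n+1}\sum_{i=1}^{n}(1-p_i)\,p_1\cdots\widehat{p_i}\cdots p_n \\
&+ (n+1)\,p_1\cdots p_{n+1} \\
={}& (1-p_{n+1})(p_1+\cdots+p_n) + p_{n+1}(p_1+\cdots+p_n) + p_{n+1}(1-p_1)\cdots(1-p_n) \\
&+ p_{n+1}\sum_{i=1}^{n} p_i (1-p_1)\cdots\widehat{(1-p_i)}\cdots(1-p_n) \\
&+ p_{n+1}\sum_{i<j} p_i p_j (1-p_1)\cdots\widehat{(1-p_i)}\cdots\widehat{(1-p_j)}\cdots(1-p_n) \\
&+ \cdots \\
&+ p_{n+1}\sum_{i=1}^{n}(1-p_i)\,p_1\cdots\widehat{p_i}\cdots p_n \\
&+ p_1\cdots p_{n+1} \\
={}& (p_1+\cdots+p_n) + p_{n+1}(1-p_1)\cdots(1-p_n) \\
&+ p_{n+1}\sum_{i=1}^{n} p_i (1-p_1)\cdots\widehat{(1-p_i)}\cdots(1-p_n) \\
&+ p_{n+1}\sum_{i<j} p_i p_j (1-p_1)\cdots\widehat{(1-p_i)}\cdots\widehat{(1-p_j)}\cdots(1-p_n) \\
&+ \cdots \\
&+ p_{n+1}\sum_{i=1}^{n}(1-p_i)\,p_1\cdots\widehat{p_i}\cdots p_n \\
&+ p_{n+1}\,p_1\cdots p_n.
\end{aligned}
$$

In the last equation we split the sum into two sums and use the induction
hypothesis. The Claim follows by the next result.

□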

Claim 2. The following equation holds

$$
\begin{aligned}
&(1-p_1)\cdots(1-p_n) + \sum_{i=1}^{n} p_i (1-p_1)\cdots\widehat{(1-p_i)}\cdots(1-p_n) \\
&\quad + \sum_{i<j} p_i p_j (1-p_1)\cdots\widehat{(1-p_i)}\cdots\widehat{(1-p_j)}\cdots(1-p_n) + \cdots + p_1\cdots p_n = 1.
\end{aligned}
$$

Proof. Using induction again, we have that the result holds for n = 1. Suppose
it is true for n and split the sum as before

$$
\begin{aligned}
&(1-p_1)\cdots(1-p_{n+1}) + \sum_{i=1}^{n+1} p_i (1-p_1)\cdots\widehat{(1-p_i)}\cdots(1-p_{n+1}) \\
&+ \sum_{i<j} p_i p_j (1-p_1)\cdots\widehat{(1-p_i)}\cdots\widehat{(1-p_j)}\cdots(1-p_{n+1}) + \cdots + p_1\cdots p_{n+1} \\
={}& (1-p_{n+1})(1-p_1)\cdots(1-p_n) + (1-p_{n+1})\sum_{i=1}^{n} p_i (1-p_1)\cdots\widehat{(1-p_i)}\cdots(1-p_n) \\
&+ p_{n+1}(1-p_1)\cdots(1-p_n) + (1-p_{n+1})\sum_{i<j} p_i p_j (1-p_1)\cdots\widehat{(1-p_i)}\cdots\widehat{(1-p_j)}\cdots(1-p_n) + \cdots \\
&+ p_1\cdots p_{n+1} = (1-p_{n+1}) + p_{n+1} = 1.
\end{aligned}
$$

□

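Actually, this long proof could be omitted if we notice that, since there are
no edges in the random complex consisting of n points, these points are
independent random variables and the number of components is just the number of
existing points, and thus

$$b_0^E(X) = E\Big(\sum_{i=1}^{n} e_i\Big) = \sum_{i=1}^{n} p_i,$$

where $e_i \in \{0, 1\}$ indicates whether point $i$ exists,
which is just another way of writing Claim 1.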
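Both claims are easy to confirm numerically for a particular choice of probabilities. The following sketch enumerates all $2^n$ configurations of $n$ independent points; the function name `claim_sums` and the sample probabilities are illustrative assumptions, not values from the text.

```python
from fractions import Fraction
from itertools import product


def claim_sums(probs):
    """Return (total probability, expected number of points) by enumeration.

    The first value is the left-hand side of Claim 2, the second is the
    left-hand side of equation (1) in the proof of Claim 1.
    """
    total_prob = Fraction(0)
    expected_points = Fraction(0)
    for state in product([0, 1], repeat=len(probs)):
        weight = Fraction(1)
        for p, s in zip(probs, state):
            weight *= p if s else 1 - p
        total_prob += weight
        expected_points += sum(state) * weight  # b_0 = number of existing points
    return total_prob, expected_points


probs = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 5), Fraction(2, 7)]
print(claim_sums(probs), sum(probs))
```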

Next, we concentrate on a more general random complex X over n points
which has cells in higher dimensions. The first observation is that $b_0^E$ depends
only on the zeroth and first chain subcomplexes of X, as the classical zeroth betti
number depends only on those subcomplexes. Then, while we sum the various monomials
corresponding to the subcomplexes of X, we can note that there are basically two
types of monomials: one corresponding to complexes in which there are no edges,

$$p_{i_1}\cdots p_{i_p}(1-p_{j_1 j_2})\cdots(1-p_{j_{r-1} j_r}),$$

and another in which there is at least one edge,

$$p_{i_1}\cdots p_{i_p}(1-p_{j_1 j_2})\cdots(1-p_{j_{r-1} j_r})\,p_{k_1 k_2}\cdots p_{k_{s-1} k_s}.$$

If we consider all summands of the first type and expand all the products in
the parentheses, there will be summands of type

$$p_{i_1}\cdots p_{i_p}.$$

As we know from Claim 1, the sum of all these summands is $p_1 + \cdots + p_n$, thus
we can split the expected zeroth betti number of X as

$$b_0^E(X) = p_1 + \cdots + p_n + (\text{terms having a variable corresponding to an edge}).$$

Now we put our efforts into deciphering the second part of the polynomial
$b_0^E(X)$. We work out the coefficient in front of a general monomial containing an
edge in the next

Lemma 1. Let $m$ be the minimum number of points which are covered by the
edges $p_{i_1 i_2}, \ldots, p_{i_{2n-1} i_{2n}}$, and let $p_1, \ldots, p_m$ be the
probabilities of the covered vertices. Then the coefficient in front of
$p_1 \cdots p_m\, p_{m+1} \cdots p_k\, p_{i_1 i_2} \cdots p_{i_{2n-1} i_{2n}}$ for $n > 0$
in the reduced polynomial $b_0^E$ is given by

$$\sum_{i=0}^{k-m}\binom{k-m}{i}(-1)^i \sum_{\substack{|\omega|=0 \\ \omega\in C(i_1,\ldots,i_{2n})}}^{n} (-1)^{|\omega|}\big(k-i-n+|\omega|+b_1^{\omega}\big). \tag{2}$$

Here $\omega$ is a combinatorial term which runs over the edges
$p_{i_1 i_2}, \ldots, p_{i_{2n-1} i_{2n}}$ and builds different subcomplexes $X^{\omega}$;
$|\omega|$ is the number of non-existing edges and $b_1^{\omega}$ is the first betti
number of the complex $X^{\omega}$.

Proof. Since $m$ is the minimum number of points which are covered by the edges,
the points corresponding to $p_1 \cdots p_m$ must exist in all subcomplexes: if one
of them fails to exist, there will be no corresponding edge term in the monomial.
In this sense, $p_1 \cdots p_m$ are fixed to exist. Since the terms
$p_{m+1} \cdots p_k$ have no influence on the edges, we can freely choose the points
$m+1, \ldots, k$ to exist or not. The same is true for the edges corresponding to
$p_{i_1 i_2} \cdots p_{i_{2n-1} i_{2n}}$. There might be more edge terms, but we must
choose them to fail to exist, as otherwise they will contribute an unnecessary
term. Thus we need to sum over all possible subcomplexes in which the points
corresponding to $p_1 \cdots p_m$ exist, the points corresponding to
$p_{m+1} \cdots p_k$ and the edges corresponding to
$p_{i_1 i_2} \cdots p_{i_{2n-1} i_{2n}}$ may or may not exist, and all other possible
edges do not exist. The term $\binom{k-m}{i}$ in the above formula is the number of
choices for the free points. $(-1)^i$ gives the sign after expanding, where $i$ is
the number of non-existing free points. $(k - i - n + |\omega| + b_1^{\omega})$ gives
the zeroth betti number of a fixed subcomplex: we have $k - i$ total points,
subtract the number of existing edges $n - |\omega|$ and add the first betti number
(Euler-Poincaré formula). $(-1)^{|\omega|}$ gives the sign of the non-existing edges,
and we have to sum over all possible existing or non-existing edges, so we get the
coefficient (2).

□

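Our next goal is to simplify the coefficient (2). Throughout, the inner sums run
over $\omega \in C(i_1, \ldots, i_{2n})$. First break the second sum into two sums

$$\sum_{i=0}^{k-m}\binom{k-m}{i}(-1)^i\Big(\sum_{|\omega|=0}^{n}(-1)^{|\omega|}(k-i) + \sum_{|\omega|=0}^{n}(-1)^{|\omega|}\big(-n+|\omega|+b_1^{\omega}\big)\Big).$$

The first summand does not depend combinatorially on $\omega$, so we can replace
the combinatorial term by a binomial coefficient

$$
\begin{aligned}
&\sum_{i=0}^{k-m}\binom{k-m}{i}(-1)^i\Big((k-i)\sum_{|\omega|=0}^{n}(-1)^{|\omega|}\binom{n}{|\omega|} + \sum_{|\omega|=0}^{n}(-1)^{|\omega|}\big(-n+|\omega|+b_1^{\omega}\big)\Big) \\
={}& \sum_{i=0}^{k-m}\binom{k-m}{i}(-1)^i\Big((k-i)(1-1)^{n} + \sum_{|\omega|=0}^{n}(-1)^{|\omega|}\big(-n+|\omega|+b_1^{\omega}\big)\Big) \\
={}& \sum_{i=0}^{k-m}\binom{k-m}{i}(-1)^i\sum_{|\omega|=0}^{n}(-1)^{|\omega|}\big(-n+|\omega|+b_1^{\omega}\big).
\end{aligned}
$$

The last equality follows from the fact that $n$ is nonzero. The two sums now
involve separate variables, so we can evaluate the first one separately

$$
\begin{aligned}
&= (1-1)^{k-m}\sum_{|\omega|=0}^{n}(-1)^{|\omega|}\big(-n+|\omega|+b_1^{\omega}\big) \\
&= \begin{cases}
\displaystyle\sum_{|\omega|=0}^{n}(-1)^{|\omega|}\big(-n+|\omega|+b_1^{\omega}\big) & \text{if } k = m, \\
0 & \text{if } k \neq m.
\end{cases}
\end{aligned}
\tag{3}
$$

If $b_1^{\omega} = 0$ for all $\omega$, the coefficient is equal to

$$
\begin{aligned}
\sum_{|\omega|=0}^{n}(-1)^{|\omega|}\big(-n+|\omega|\big)\binom{n}{|\omega|}
&= \sum_{|\omega|=0}^{n-1}(-1)^{|\omega|+1}\, n\binom{n-1}{|\omega|} \\
&= -n(1-1)^{n-1}
= \begin{cases} -1 & \text{if } n = 1, \\ 0 & \text{if } n \neq 1. \end{cases}
\end{aligned}
$$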

Thus we proved the following 

Claim 3. All edges come with coefficient -1. 

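Finally, let us examine the case when $b_1 \neq 0$ in equation (3). We can split
the last sum once again and substitute the combinatorial term in the second one

$$
\begin{aligned}
&\sum_{\substack{|\omega|=0 \\ \omega\in C(i_1,\ldots,i_{2n})}}^{n}(-1)^{|\omega|}\, b_1^{\omega} + \sum_{\substack{|\omega|=0 \\ \omega\in C(i_1,\ldots,i_{2n})}}^{n}(-1)^{|\omega|}\big(-n+|\omega|\big) \\
={}& \sum_{\substack{|\omega|=0 \\ \omega\in C(i_1,\ldots,i_{2n})}}^{n}(-1)^{|\omega|}\, b_1^{\omega} + \sum_{|\omega|=0}^{n}(-1)^{|\omega|}\big(-n+|\omega|\big)\binom{n}{|\omega|}.
\end{aligned}
$$

Now the second sum is 0, since one edge can never build a cycle and hence
$n \neq 1$. Thus we have the following

Theorem 1. Suppose that $b_1$ of the complex corresponding to
$p_1 \cdots p_m\, p_{i_1 i_2} \cdots p_{i_{2n-1} i_{2n}}$ is nonzero. Then the coefficient
in front of $p_1 \cdots p_m\, p_{i_1 i_2} \cdots p_{i_{2n-1} i_{2n}}$ in the reduced
polynomial $b_0^E(X)$ is given by

$$\sum_{\substack{|\omega|=0 \\ \omega\in C(i_1,\ldots,i_{2n})}}^{n}(-1)^{|\omega|}\, b_1^{\omega}. \tag{4}$$

Example. Set all $p_i$ and $p_{ij}$ to be either 0 or 1 and plug them into the
formula for $b_0^E$. Then you get another decomposition of the classical $b_0$.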

Definition 2. We call a simple 1-cycle a complex in which each vertex is covered 
by exactly two edges. In general, simple n-cycle is a complex in which each (n-1)- 
cell is covered by exactly two n-cells. 

Consequence 1. The coefficient of the monomial corresponding to a simple 1-cycle
is always 1.

Proof. The only complex with nonzero first betti number is the complex
consisting of the simple cycle.

□
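The coefficient in formula (4) can be evaluated by brute force for small complexes. The sketch below is an illustration under stated assumptions: a complex is given as a list of edges, all covered vertices are kept while edges are removed, and the helper names `first_betti` and `coefficient_c1` are hypothetical. The first betti number of a graph is computed as (edges) - (vertices) + (components). Formula (4) is stated for complexes with nonzero $b_1$; single edges are covered by Claim 3 instead.

```python
from itertools import combinations


def first_betti(vertices, edges):
    """b_1 of a graph: number of edges - number of vertices + number of components."""
    parent = {v: v for v in vertices}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for a, b in edges:
        parent[find(a)] = find(b)
    components = len({find(v) for v in vertices})
    return len(edges) - len(vertices) + components


def coefficient_c1(edges):
    """Alternating sum of b_1 over all ways of removing edges, as in formula (4)."""
    vertices = sorted({v for e in edges for v in e})
    n = len(edges)
    total = 0
    for removed in range(n + 1):
        for kept in combinations(edges, n - removed):
            total += (-1) ** removed * first_betti(vertices, kept)
    return total


triangle = [(1, 2), (2, 3), (1, 3)]
square = [(1, 2), (2, 3), (3, 4), (1, 4)]
print(coefficient_c1(triangle), coefficient_c1(square))
```

The loop visits every subset of the edge set, so the cost is exponential in the number of edges.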

Definition 3. We call a 1-spike the connected union of edges which do not
constitute a 1-cycle in a complex. Similarly, an n-spike is the connected union of
n-cells which do not constitute an n-cycle.

Suppose now we are given a simple 1-cycle of any length with $k$ 1-spikes, either
attached to it or not, $k > 0$. The only way to decompose the complex so that each
new complex has nonzero first betti number is to combinatorially remove all the
spikes. Using Theorem 1 we get that the coefficient of the monomial corresponding
to the initial complex is zero, since $k$ is nonzero and

$$\sum_{i=0}^{k}(-1)^i\binom{k}{i} = (1-1)^k = 0.$$

Definition 4. Two complexes are said to be n-non-intersecting if they do not
intersect in an n-cell, although they may intersect in cells of dimension lower
than n. An intersection of two complexes is said to be n-nonempty if the complexes
intersect in an n-cell.

It turns out that there are no terms in the $b_0^E$ polynomial which correspond to
two 1-non-intersecting cycles of lengths $n_1 > 0$ and $n_2 > 0$, since in this case
the coefficient is given by

$$
\begin{aligned}
2 + \sum_{i=1}^{n_1}(-1)^i\binom{n_1}{i} + \sum_{i=1}^{n_2}(-1)^i\binom{n_2}{i}
&= \sum_{i=0}^{n_1}(-1)^i\binom{n_1}{i} + \sum_{i=0}^{n_2}(-1)^i\binom{n_2}{i} \\
&= (1-1)^{n_1} + (1-1)^{n_2} = 0.
\end{aligned}
$$

Suppose we are given a complex consisting of two cycles of lengths $n_1 > 0$ and
$n_2 > 0$ with 1-nonempty intersection which occurs in $k > 0$ edges, and no spikes.
Then the coefficient of the corresponding monomial is -1, since

$$
\begin{aligned}
&2 + \sum_{i=1}^{n_1}(-1)^i\binom{n_1}{i} + \sum_{i=1}^{n_2}(-1)^i\binom{n_2}{i} + \sum_{i=1}^{k}(-1)^i\binom{k}{i} \\
&\quad= \sum_{i=0}^{n_1}(-1)^i\binom{n_1}{i} + \sum_{i=0}^{n_2}(-1)^i\binom{n_2}{i} + \sum_{i=0}^{k}(-1)^i\binom{k}{i} - 1 \\
&\quad= (1-1)^{n_1} + (1-1)^{n_2} + (1-1)^{k} - 1 = -1.
\end{aligned}
$$

Example. So far we have discovered most of the formula for the expected
zeroth betti number of the tetrahedron. The only unknown which remains is the
coefficient of the maximal complex, which we denote by $C$.

$$
\begin{aligned}
b_0^E(X) ={}& p_1 + p_2 + p_3 + p_4 \\
&- p_{12} - p_{13} - p_{14} - p_{23} - p_{24} - p_{34} \\
&+ p_{12}p_{13}p_{23} + p_{12}p_{14}p_{24} + p_{13}p_{14}p_{34} + p_{23}p_{24}p_{34} \\
&+ p_{12}p_{23}p_{34}p_{14} + p_{12}p_{24}p_{34}p_{13} + p_{13}p_{23}p_{24}p_{14} \\
&- p_{12}p_{13}p_{14}p_{23}p_{24} - p_{12}p_{13}p_{14}p_{23}p_{34} - p_{12}p_{13}p_{14}p_{24}p_{34} \\
&- p_{12}p_{13}p_{23}p_{24}p_{34} - p_{12}p_{14}p_{23}p_{24}p_{34} - p_{13}p_{14}p_{23}p_{24}p_{34} \\
&+ C\, p_{12}p_{13}p_{14}p_{23}p_{24}p_{34}
\end{aligned}
\tag{5}
$$

One way to get the coefficient $C$ is by using formula (4) and combinatorially
decomposing the corresponding complex. We leave this as an exercise. Another,
more efficient way to calculate the coefficient $C$ is to set all probabilities of
the points and edges equal to one and substitute them in equation (5). In this case
$b_0^E(X)$ is equal to $b_0(X) = 1$ and we can solve the equation

$$1 = 4 - 6 + 4 + 3 - 6 + C$$

for $C$ and get $C = 2$.

As one can notice, computing coefficients via formula (4) is slow and
resource demanding. We will be looking for a more efficient formula for computing
the coefficients of the monomials in the polynomial $b_0^E$. For simplicity, we
introduce the following notation.

The initial complex will usually be denoted by $A$, and the set of all complexes
which are derived from $A$ by removing exactly $i$ edges will be denoted by $A^i$.
Notice that $A^0$ is just $A$. The notation $b_1(A^i)$ will stand for the sum of the
betti numbers of all complexes which belong to $A^i$, which we can write as

$$b_1(A^i) = \sum_{\substack{\omega\in C(i_1,\ldots,i_{2n}) \\ |\omega|=i}} b_1^{\omega}.$$

Similarly, $c_1(A)$ will mean the coefficient of $A$ given by formula (4) and
$c_1(A^i)$ will denote the sum of the coefficients of the elements of $A^i$. Here the
subindex in $c_1$ is used to emphasize that the coefficient corresponds to the
1-cell complex. Considering the results above, we artificially define $c_1(A)$ to be
zero if $A$ has $b_1 = 0$.

Suppose the random complex $A$ has $n$ edges. Using the notation we just
introduced, we can rewrite formula (4) as

$$c_1(A) = \sum_{i=0}^{n}(-1)^i\, b_1(A^i). \tag{6}$$

Our next goal is to keep $b_1(A^0)$ and write $b_1(A^1)$ in terms of $c_1(A^1)$.

The careful reader can notice that after removing an edge from the elements of
$A^1$, the set of all new complexes covers $A^2$ twice. In case we remove two
edges from the complexes of $A^1$, we can notice that $A^3$ is covered three times,
and we can conjecture that for any $i$, $A^i$ is covered $i$ times. Notice that $A^1$
has $\binom{n}{1}$ elements each having $n - 1$ edges and $A^i$ has $\binom{n}{i}$
elements each with $n - i$ edges. There are $\binom{n-1}{1}$ choices to remove an
edge from a fixed element of $A^1$ and there are $n$ such elements, so $A^2$ is
covered twice. Next, we have $2\binom{n}{2}$ elements and we remove one more edge,
and it is easy to see that $A^3$ is covered three times. The statement is proved by
induction and is left as an exercise.

Using the last observation, we can rewrite equation (6) as

$$
\begin{aligned}
c_1(A) ={}& b_1(A^0) - b_1(A^1) + \cdots + (-1)^n n\, b_1(A^n) \\
&+ \big(-b_1(A^2) + 2b_1(A^3) - \cdots + (-1)^{n-1}(n-1)\, b_1(A^n)\big) \\
={}& b_1(A^0) - c_1(A^1) - b_1(A^2) + 2b_1(A^3) - \cdots + (-1)^{n-1}(n-1)\, b_1(A^n).
\end{aligned}
$$

The strategy remains the same: change the next summand $b_1(A^2)$ to $c_1(A^2)$.
This time we need to calculate how many times $A^2$ covers $A^3$. In order to save
time and effort, let us do it in general. We need to figure out how many times
the elements from $A^j$ cover the elements in $A^i$, for $j < i$. The number of edges
of each element of $A^j$ is $n - j$ and we need to remove $i - j$ 1-cells in order to
drop to level $A^i$. There are $\binom{n}{j}$ elements in $A^j$, each covering
$\binom{n-j}{i-j}$ elements from $A^i$, i.e. $\binom{n}{j}\binom{n-j}{i-j}$ compared to
$\binom{n}{i}$ gives that $A^j$ covers $A^i$ exactly $\binom{i}{j}$ times.

Thus, the coefficient of $b_1(A^m)$, after changing the first $m - 1$ betti terms to
the corresponding coefficients $c_1$, is always

$$(-1)^m\Big(1 - \binom{m}{1} + \binom{m}{2} - \cdots + (-1)^{m-1}\binom{m}{m-1}\Big) = (-1)^m\big((1-1)^m - (-1)^m\big) = -1.$$

We just proved the next

Theorem 2. Using the above notation and definitions, the following formula
holds

$$\sum_{i=0}^{n} c_1(A^i) = b_1(A). \tag{7}$$

It is a good moment to verify this formula by an

Example. $c_1(\text{Pyramid}) = 3 - 6(-1) - 3(1) - 4(1) = 2$.
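Formula (7) can be read as a recursion: the coefficient of a complex is its first betti number minus the coefficients of all complexes obtained by removing at least one edge. The sketch below is one possible illustration of that reading, not the author's implementation. It assumes a complex is represented as a frozenset of edges, and the names `first_betti` and `c1` are hypothetical. Isolated vertices do not change $b_1$, so only vertices covered by edges are counted.

```python
from functools import lru_cache
from itertools import combinations


def first_betti(edges):
    """b_1 of the graph spanned by the given edges."""
    vertices = {v for e in edges for v in e}
    parent = {v: v for v in vertices}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for a, b in edges:
        parent[find(a)] = find(b)
    components = len({find(v) for v in vertices})
    return len(edges) - len(vertices) + components


@lru_cache(maxsize=None)
def c1(edges):
    """Coefficient of the complex: b_1(A) minus the coefficients of all proper sub-edge-sets."""
    total = first_betti(edges)
    edge_list = sorted(edges)
    for size in range(len(edge_list)):  # every proper subset of the edges
        for sub in combinations(edge_list, size):
            total -= c1(frozenset(sub))
    return total


pyramid = frozenset([(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)])
print(c1(pyramid))
```

Memoization keeps repeated subcomplexes from being recomputed, although the total work still grows exponentially with the number of edges.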

So far, we know that any 1-cycle with $k$ 1-spikes has a coefficient equal to
0. We also know that the coefficient of a complex consisting of any two
1-non-intersecting cycles equals 0 too. Our next goal is to show that the
coefficient of the complex consisting of two simple cycles, either 1-intersecting
or not, and a spike is 0. Denote the complex by $A$ and the two simple cycles by
$A_1$ and $A_2$. Let $A'$ be the subcomplex derived from $A$ by erasing the spike and
let $A^s$ be the set of complexes which is complementary to $A'$, i.e. such that
$A^1 = A' \cup A^s$. Split formula (7) into two sets of elements: one generated by
all the subcomplexes obtained from $A'$ and the second one from $A^s$. If $A$
consists of two 1-intersecting simple cycles and a spike, then $c_1(A) = 0$ follows
from formula (7) applied to $c_1(A')$. In case we are dealing with two
1-non-intersecting simple cycles and a spike, we can write formula (7) as

$$c_1(A) = 2 - c_1(A') - c_1(A^1) - c_1(A^2) - \cdots - c_1(A_1) - c_1(A_2),$$

where all the coefficients except $c_1(A_1)$ and $c_1(A_2)$ are 0, so once again we
have

$$c_1(A) = 0.$$

Finally, we can extend the result to a complex of two simple cycles and $k$
spikes by induction: if we suppose that the coefficient of two cycles with $k - 1$
spikes is zero, then using exactly the same arguments as above we get that
$c_1(A) = 0$. To summarize everything known so far, we discovered that the
coefficient of any complex of two simple cycles, either 1-intersecting or not, and
$k > 0$ spikes is always 0. By the same argument, we can take an arbitrary complex
with non-zero coefficient, add a spike to it, use formula (7), and get that the
coefficient of the new complex is 0. Then, using induction, one can show that $c_1$
of the union of the non-zero coefficient complex and $k$ spikes is again 0.

Our final aim is to extend the result to any number of simple cycles and
spikes satisfying the previous restrictions, i.e. we will show that the coefficient
of a complex consisting of any number of simple cycles, either 1-intersecting or
not, and any positive number of spikes is always 0, and also that the coefficient
of a complex which is built up from two 1-non-intersecting complexes is 0. Then
one can use the same argument to extend the result from simple cycles to arbitrary
complexes. Once again, we make use of the simple but powerful method of induction;
this time the induction is on the number of components. Suppose that the
coefficient of any number of 1-non-intersecting cycles up to $p > 1$ is zero and
that the coefficient of any number of 1-non-intersecting cycles up to some number
$b_0$ with any number of spikes is also zero. Then the proof is done in two steps.
First, take $A$ to be $p + 1$ 1-non-intersecting fundamental cycles. Using formula
(7) we have that

$$c_1(A) = p + 1 - c_1(A^1) - c_1(A^2) - \cdots - \sum_{i=1}^{p+1} c_1(A_i).$$

By the induction hypothesis, all $c_1(A^i) = 0$, thus $c_1(A) = 0$. As before, using
induction again, we can show that $p + 1$ cycles with any number of spikes have
coefficient equal to 0.

Thus we have revealed everything about the structure of the polynomial which
measures $b_0^E$.

Theorem 3. After reduction, the polynomial $b_0^E$ has the structure

$$b_0^E = p_1 + \cdots + p_n - \sum p_{ij} + (\text{terms of higher order}),$$

where the terms of higher order with nonzero coefficients are built by simple cycles
having 1-intersections. The coefficients of these complexes are given by formula
(7). The coefficients of all other complexes are 0.

There are two elementary operations which build up any complex. The first
one, $\Delta$, is symmetric difference: it takes the union of two complexes and
subtracts the 1-dimensional cells of the common intersection. The second one,
$\cup$, is the usual union. We add the subindex 1 to an operation if we want to
clarify that the operation applies at one 1-dimensional cell, or 2 if it applies
at two 1-dimensional cells.

The shape of the figures that build up the complex does not matter as long as they
are topologically the same.

Lemma 2. For any complex $A$ and a triangle $A^*$,

$$c_1(A\,\Delta_1 A^*) = c_1(A).$$

Proof. Notice that $\Delta_1$ does not change $b_1$, i.e. $b_1(A\,\Delta_1 A^*) = b_1(A)$,
and let $b_1(A) = n$. Then using Theorem 2 we have

$$\sum_{A' \in A \Delta_1 A^*} c_1(A') = \sum_{A' \in A} c_1(A'). \tag{8}$$

The proof uses induction on $b_1$. If $b_1(A) = 1$, we already know that
$c_1(A\,\Delta_1 A^*) = c_1(A)$. Suppose it is true that $c_1(A\,\Delta_1 A^*) = c_1(A)$
for all $A$ such that $b_1(A) < n$; then by equation (8) and the fact that there is
only one structure with $b_1 = n$, the proof follows.

□

Theorem 4. For any structure $A$ and a triangle $A^*$,

$$c_1(A \cup_1 A^*) = -c_1(A).$$

Proof. We use a technique similar to the Mayer-Vietoris sequence. Decompose
the complex $A \cup_1 A^*$ as a union of $A$, $A^*$ and $A'$, where the last one is the
set of complexes which are built by elementary operations on elements from both $A$
and $A^*$. If we substitute the elements from $A$ and $A^*$ in formula (7), we get that

$$\sum_{\omega \in A} c_1(\omega * A^*) = 0, \tag{9}$$

where $*$ is any of the elementary operations. Notice that if $\omega \in A$ and $A^*$
are 1-nonintersecting, then $c_1(\omega \cup_1 A^*) = c_1(\omega\,\Delta_1 A^*) = 0$. The
rest of the proof can be done by induction starting with

$$c_1(\omega \cup_1 A^*) = -c_1(\omega\,\Delta_1 A^*) = -c_1(\omega) = -1,$$

when $\omega$ is a fundamental cycle. Then using the induction hypothesis and formula
(9), it follows that $c_1(A \cup_1 A^*) = -c_1(A)$.

□

3 Term Structure of $b_1^E$

Consider the following monomial

$$p_1 \cdots p_i\, p_{i+1} \cdots p_j\, p_{j+1} \cdots p_k\; p_{p_1 p_2} \cdots p_{p_{2m-1} p_{2m}}\, p_{p_{2m+1} p_{2m+2}} \cdots p_{p_{2n-1} p_{2n}}\; p_{q_1 q_2 q_3} \cdots p_{q_{3s-2} q_{3s-1} q_{3s}}$$

such that $m$ is the minimum number of edges which are covered by the $s$ 2-faces and
$i$ is the minimum number of points covered by these $m$ edges. Also, there are
another $n - m$ edges which cover $j - i$ points. Thus, once again we are forced to
choose the first $i$ points together with the next $j - i$ points and the $m$ edges to
exist. So we can choose only whether the $k - j$ points, $n - m$ edges and $s$ faces
exist or not. This time we are interested in summing those monomials with
coefficients given by the first betti number of the monomial. Then the following
expression gives the coefficient of the above monomial

$$-\sum_{t=0}^{k-j}(-1)^t\binom{k-j}{t} \sum_{\substack{|\nu|=0 \\ \nu\in C(p_{2m+1},\ldots,p_{2n})}}^{n-m}(-1)^{|\nu|} \sum_{\substack{|\omega|=0 \\ \omega\in C(q_1,\ldots,q_{3s})}}^{s}(-1)^{|\omega|}\big(k-t-n+|\nu|+s-|\omega|-b_0^{\nu\omega}-b_2^{\nu\omega}\big)$$

where $t$ is the number of non-existing points, $|\nu|$ is the number of non-existing
edges and $|\omega|$ is the number of non-existing faces, and in the parentheses you
can recognize the Euler-Poincaré formula for $b_1$. Notice that
$k - t - n + |\nu| - b_0^{\nu\omega}$ does not depend on $\omega$, as $b_0$ does not depend on
whether there is a face or not, and instead we can write $b_0^{\nu}$. So we can split
the sum into three sums and evaluate each one of them

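$$
\begin{aligned}
-\sum_{t=0}^{k-j}(-1)^t\binom{k-j}{t}\sum_{\substack{|\nu|=0 \\ \nu\in C(p_{2m+1},\ldots,p_{2n})}}^{n-m}(-1)^{|\nu|}\Big(&\sum_{\substack{|\omega|=0 \\ \omega\in C(q_1,\ldots,q_{3s})}}^{s}(-1)^{|\omega|}\big(k-t-n+|\nu|-b_0^{\nu}\big) \\
&+ \sum_{\substack{|\omega|=0 \\ \omega\in C(q_1,\ldots,q_{3s})}}^{s}(-1)^{|\omega|}\big(s-|\omega|\big) + \sum_{\substack{|\omega|=0 \\ \omega\in C(q_1,\ldots,q_{3s})}}^{s}(-1)^{|\omega|+1}\, b_2^{\nu\omega}\Big)
\end{aligned}
$$

Now, $k - t - n + |\nu| - b_0^{\nu}$ is exactly $-b_1^{\nu}$, and let us remove the
combinatorial term wherever possible. Writing $\sum_{\nu}$ for the sum over
$\nu \in C(p_{2m+1},\ldots,p_{2n})$ and $\sum_{\omega}$ for the sum over
$\omega \in C(q_1,\ldots,q_{3s})$, we get

$$
\begin{aligned}
&-\sum_{t=0}^{k-j}(-1)^t\binom{k-j}{t}\sum_{\nu}(-1)^{|\nu|}\Big(-b_1^{\nu}\sum_{|\omega|=0}^{s}(-1)^{|\omega|}\binom{s}{|\omega|} + \sum_{|\omega|=0}^{s}(-1)^{|\omega|}\big(s-|\omega|\big)\binom{s}{|\omega|} + \sum_{\omega}(-1)^{|\omega|+1}\, b_2^{\nu\omega}\Big) \\
={}& -\sum_{t=0}^{k-j}(-1)^t\binom{k-j}{t}\sum_{\nu}(-1)^{|\nu|}\Big(-b_1^{\nu}(1-1)^{s} + (-1)^{s-1}s\sum_{|\omega|=0}^{s-1}(-1)^{s-1-|\omega|}\binom{s-1}{|\omega|} + \sum_{\omega}(-1)^{|\omega|+1}\, b_2^{\nu\omega}\Big) \\
={}& -\sum_{t=0}^{k-j}(-1)^t\binom{k-j}{t}\sum_{\nu}(-1)^{|\nu|}\Big(-b_1^{\nu}(1-1)^{s} + (-1)^{s-1}s(1-1)^{s-1} + \sum_{\omega}(-1)^{|\omega|+1}\, b_2^{\nu\omega}\Big).
\end{aligned}
$$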

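Once again we can evaluate the first sum, as $b_1^{\nu}$ and $b_2^{\omega}$ do not depend on the number of non-existing points, so we get $k = j$ and so the coefficient is

$$\sum_{\substack{|\nu|=0\\ \nu\in C(p_{2m+1},\dots,p_{2n})}}^{n-m}(-1)^{|\nu|+1}\Biggl(-b_1^{\nu}(1-1)^{s}+(-1)^{s-1}s\,(1-1)^{s-1}+\sum_{\substack{|\omega|=0\\ \omega\in C(q_1,\dots,q_{3s})}}^{s}(-1)^{|\omega|+1}b_2^{\omega}\Biggr).$$

If $s = 0$, then the coefficient is given by the first summand only, as all others are 0:

$$\sum_{\substack{|\nu|=0\\ \nu\in C(p_{2m+1},\dots,p_{2n})}}^{n-m}(-1)^{|\nu|}\,b_1^{\nu}$$

which is the coefficient $c_1$. If $s > 0$, then $(1-1)^{s} = 0$ and we get the following coefficient

$$\sum_{\substack{|\nu|=0\\ \nu\in C(p_{2m+1},\dots,p_{2n})}}^{n-m}(-1)^{|\nu|+1}\Biggl((-1)^{s-1}s\,(1-1)^{s-1}+\sum_{\substack{|\omega|=0\\ \omega\in C(q_1,\dots,q_{3s})}}^{s}(-1)^{|\omega|+1}b_2^{\omega}\Biggr).$$

We can evaluate the first sum and get that $n = m$, and the coefficient becomes

$$(-1)^{s}s\,(1-1)^{s-1}+\sum_{\substack{|\omega|=0\\ \omega\in C(q_1,\dots,q_{3s})}}^{s}(-1)^{|\omega|}\,b_2^{\omega}.$$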

When $s = 1$ the second sum is zero and the coefficient is $-1$. Similarly to the discussion above, define the coefficient $c_2$ to be

$$c_2(A)=\sum_{\substack{|\omega|=0\\ \omega\in C(q_1,\dots,q_{3s})}}^{s}(-1)^{|\omega|}\,b_2^{\omega}$$

if $b_2(A)$ is non-zero.
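For the graph case, the analogous coefficient $c_1$ can be computed by brute force directly from its definition as an alternating sum of first betti numbers over all sets of removed edges. The sketch below assumes that all vertices are present and that a monomial is given as a list of edges; the helper names `betti1` and `c1` are hypothetical.

```python
from itertools import combinations


def betti1(num_vertices, edges):
    """First betti number of a graph: |E| - |V| + number of components."""
    parent = list(range(num_vertices))

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    components = num_vertices
    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
            components -= 1
    return len(edges) - num_vertices + components


def c1(num_vertices, edges):
    """Alternating sum of b_1 over every set of removed edges."""
    total = 0
    for k in range(len(edges) + 1):
        for removed in combinations(edges, k):
            kept = [e for e in edges if e not in removed]
            total += (-1) ** k * betti1(num_vertices, kept)
    return total


triangle = [(0, 1), (1, 2), (0, 2)]
print(c1(3, triangle))             # a single cycle
print(c1(4, triangle + [(2, 3)]))  # the same cycle with a dangling edge
```

In this sketch a single cycle gets coefficient 1, while adding a dangling edge or taking two disjoint triangles gives 0.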

In general, we can apply exactly the same procedure for calculating $b_k^E$ and get that each $k$-cell comes with sign equal to $(-1)^{k-1}$, and for $b_k(A) \neq 0$ we have

$$c_k(A)=\sum_{\substack{|\omega|=0\\ \omega\in C(q_1,\dots,q_{ks})}}^{s}(-1)^{|\omega|}\,b_k^{\omega}.$$

Similarly to the results in the lower case, we can prove 
Consequence 2. 

n 

bk{A) = j2ckm (10) 

and so there are no terms corresponding to m-spike and m-nonintersecting cycles, 
where m < k. 



Theorem 5. The polynomial $b_n^E$ has the following structure

where $p_i^E$ is the polynomial corresponding to the higher order terms of the $i$-cells and $d_{n+1}$ is the polynomial corresponding to the $(n+1)$-cells.

4 Explicit Definitions

Definition 5. Let the random variable $\mathbb{C}_n$ measure the number of n-cells in the complex $X$, i.e. $\mathbb{C}_n = e_1 + \dots + e_n$, $e_j \in \{0,1\}$, where each $e_j$ is itself a random variable which corresponds to a real n-cell in the complex of all possible configurations of subcomplexes. Define the expected number of n-cells $C_n^E$ to be

$$C_n^E=\sum_j c_j\,P(c_j)=E(\mathbb{C}_n)$$

where $P(c_j) = P(\mathbb{C}_n = c_j)$ and $c_j$ is the number of n-cells.

Similarly, one can define the rank of the expected n-cycles and n-boundaries.

Definition 6. Let the random variable $\mathbb{Z}_n$ measure the number of n-cycles in the complex $X$, i.e. $\mathbb{Z}_n = k_1 + \dots + k_m$, $k_i \in \{0,1\}$, where each $k_i$ is itself a random variable which corresponds to a real n-cycle in the complex. Define the expected number of n-cycles $Z_n^E$ to be

$$Z_n^E=\sum_j z_j\,P(z_j)=E(\mathbb{Z}_n)$$

where $z_j$ is the number of n-cycles.

Definition 7. Let the random variable $\mathbb{B}_n$ measure the number of n-boundaries in the complex $X$, i.e. $\mathbb{B}_n = l_1 + \dots + l_p$, $l_i \in \{0,1\}$, where each $l_i$ is itself a random variable which corresponds to a real n-boundary. Define the expected number of n-boundaries $B_n^E$ to be

$$B_n^E=\sum_j b_j\,P(b_j)=E(\mathbb{B}_n)$$

where $b_j$ varies over the number of n-boundaries.
From probability theory, we know that

$$E(X+Y)=E(X)+E(Y) \qquad (11)$$

thus it directly follows that

$$C_n^E=Z_n^E+B_{n-1}^E.$$

Whatever $H_n^E = Z_n^E / B_n^E$ is supposed to be, define the n-th betti number of a random complex to be

$$b_n^E=Z_n^E-B_n^E.$$

Definition 8. Define the expected Euler characteristic to be

$$\chi^E=\sum_i(-1)^i\,C_i^E.$$

Applying equation (11) to the last definition and taking into account the cancellations which occur by the minus sign in each consecutive summand in the above equation, we get

Theorem 6. Expected Euler-Poincaré Formula.

$$\chi^E=\sum_i(-1)^i\,b_i^E.$$
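The cancellation can be written out explicitly. This is a sketch under two assumptions that are made here only for the derivation: the complex has finite dimension $N$ (a symbol introduced for this sketch), with $B_{-1}^E = B_N^E = 0$, and the expected ranks satisfy $C_i^E = Z_i^E + B_{i-1}^E$.

```latex
\begin{aligned}
\chi^E &= \sum_{i=0}^{N} (-1)^i C_i^E
        = \sum_{i=0}^{N} (-1)^i \bigl( Z_i^E + B_{i-1}^E \bigr) \\
       &= \sum_{i=0}^{N} (-1)^i Z_i^E \;-\; \sum_{j=0}^{N-1} (-1)^{j} B_j^E
        \qquad (j = i - 1) \\
       &= \sum_{i=0}^{N} (-1)^i \bigl( Z_i^E - B_i^E \bigr)
        = \sum_{i=0}^{N} (-1)^i b_i^E .
\end{aligned}
```

Each boundary term $B_{i-1}^E$ coming from dimension $i$ reappears with the opposite sign in dimension $i-1$, which is the cancellation referred to in the text.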



The last theorem can be easily verified by Theorems 3 and 5.
From Theorem 6 it follows

Theorem 7.

$$\chi^E(A\cup B)=\chi^E(A)+\chi^E(B)-\chi^E(A\cap B)$$

for random complexes A and B.

In general, one can build a complete stochastic homology theory by redefining the classical homology axioms in the stochastic homology sense. The classical homology theory sits inside the stochastic one, and most of the classical results remain valid for the stochastic homology as well. However, the most important result for calculation, the Mayer-Vietoris exact sequence, does not hold. The reason is that the polynomial of higher terms $p_n^E$ of a union of complexes contains terms from both subcomplexes, and thus knowing the expected betti numbers of the two subcomplexes is simply not enough to generate the expected betti number of the union.

5 Computation, Algorithm and Experimental Results

A quick look at Theorems 3 and 5 reveals that calculation of expected betti numbers in general seems to be impossible for a big set of points, as the number of summands in $b_n^E$ grows rapidly. However, one can notice that each polynomial representing $b_n^E$ has several symmetries in it. It turns out that those polynomials are stabilized by a subgroup $S_m \subseteq S_{\binom{m}{n+1}}$, where $m$ is the number of 0-dimensional cells of the complex. This is not surprising, as any swap of two vertices gives a generator of $S_m$. Such a polynomial is called almost symmetric, and my hope was to try to represent it as a product of linear terms over the field of complex numbers. Unfortunately, my attempts did not give a positive result.

Thus, the only reasonable calculation of these polynomials can occur if we fix all probabilities of cells of dimension greater than 1 to be equal. Let's concentrate on the calculation of the polynomial $p_1^E$, i.e. $p_{12} = \dots = p_{(m-1)m} = x$. For $m = 4$, we have

$$p_1^E = 2x^6 - 6x^5 + 3x^4 + 4x^3$$

and for $m = 5$

$$p_1^E = -6x^{10} + 40x^9 - 105x^8 + 130x^7 - 60x^6 - 18x^5 + 15x^4 + 10x^3.$$

In general, the polynomial $p_n^E$ is of degree $\binom{m}{n+1}$ and this polynomial is easily computable at least for small $n$ even when $m$ is large. Thus one can calculate expected betti numbers for a Vietoris-Rips complex.
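The two polynomials above can be cross-checked by exhaustive enumeration. The sketch below assumes that, for equal edge probability $x$, the polynomial coincides with the expected first betti number of the complete graph on $m$ vertices in which every vertex is present and every edge is kept independently with probability $x$; the function names are hypothetical, and exact rational arithmetic is used to avoid rounding.

```python
from fractions import Fraction
from itertools import combinations


def betti1(m, edges):
    """First betti number of a graph on m vertices: |E| - |V| + components."""
    parent = list(range(m))

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    components = m
    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
            components -= 1
    return len(edges) - m + components


def expected_betti1(m, x):
    """Exact E[b_1] over all 2^C(m,2) edge subsets of the complete graph."""
    all_edges = list(combinations(range(m), 2))
    total = Fraction(0)
    for k in range(len(all_edges) + 1):
        weight = x ** k * (1 - x) ** (len(all_edges) - k)
        for kept in combinations(all_edges, k):
            total += weight * betti1(m, kept)
    return total


def p1_m4(x):
    return 2 * x**6 - 6 * x**5 + 3 * x**4 + 4 * x**3


def p1_m5(x):
    return (-6 * x**10 + 40 * x**9 - 105 * x**8 + 130 * x**7
            - 60 * x**6 - 18 * x**5 + 15 * x**4 + 10 * x**3)


x = Fraction(1, 3)
print(expected_betti1(4, x) == p1_m4(x))
print(expected_betti1(5, x) == p1_m5(x))
```

The enumeration is exponential in the number of edges, so it is only practical for very small $m$; it serves as a check on the closed forms rather than as a way to compute them.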

The next question is how to assign probabilities to the cells when you have 
no statistics available for their distribution. One geometric way is to look at the 
distance between the center of mass of the cell and the closest point to it available 
from the data. Call this distance r^ and denote with r^ the distance between the 
center of mass and a vertex of the cell. There are many ways to assign probability 
using this data. I was looking at the circles centered at the center of mass with 
radii r^ and r^, thus the first probability which I constructed was 



p = 1 — 



vrr' 



1 



vrr 



It turns out that this probability shrinks the gaps between the points because the 
volume of a circle is contained at the exterior and so it was not what I was looking 
for. However, it gives a good idea for a better probability 

p = 1 - — 

\'ra 

which works well for random complexes. The whole class 

/ \ ^^^ 
p= 1- I — ) , V 



1- 






keN 



seems to be helpful for different types of data. 

Once the probabilities are assigned, we generate monomials corresponding to n-cells of different degrees. Each monomial has a unique coefficient $c_n$ and is distributed in the random complex by the action of the stabilizer subgroup. Thus, it is only needed to calculate the order $o_n$ of the orbit, which can be done by combinatorial methods. The coefficients $c_n$ can be either zero or non-zero.

The first time I discovered the magnificent properties of these numbers, I spontaneously called them "magical" coefficients. Though I couldn't prove anything more than the result cited above, the following experimental results reappear for different complexes. The first one is to decompose combinatorially a full cycle except one edge. The sum of all coefficients is zero.

Decomposition of cycle

We decompose the cycle 1-3-6-2, but leave the edge 3-6 intact. The slide on the left shows in red which edges we decompose. The red edges in the right slides denote missing edges. The coefficient is printed in the upper left corner. The slides with zero coefficients are not shown above.

The next result which reappears again and again is obtained if you decompose combinatorially the edges at a vertex. The sum of all such coefficients is also zero.









Decomposition of edges at a vertex 

Here we decompose the edges at vertex 6 combinatorially.

The reason to call these coefficients "magical" is that you can get even more 
interesting results. You can add more edges to the above decompositions and still 
get that the sum of the coefficients is zero. 














Decomposition of a cycle and an edge 

And we can continue adding edges to be decomposed, and each time the sum of the coefficients appears to be zero.




















Decomposition of cycle and two edges 

In fact, these coefficients are so designed that if we set all probabilities of cells equal to one, then no matter what complex we have, the top coefficient, i.e. the one corresponding to the maximal monomial in this case, always wipes out all other coefficients and leaves the betti number only. This is the reason why Mayer-Vietoris holds in the classical case, but not in the random one.

Back to the discussion of the algorithm: the zero coefficients are associated with either a complex which has a spike, or a complex which is built up from two or more cycles which have zero n-intersection. The spikes can be detected by counting the degrees of the vertices. We say that a vertex has degree $h$ if $h$ cells cover that vertex. Thus there is an n-spike in the complex if there is a vertex with degree $n$. The n-intersection function is a little bit more complicated. We remove each n-cycle which has n-intersection with another cycle. At the end of the procedure, if there is a vertex of degree greater than $n + 1$, then the complex does not satisfy the n-intersection test. If the monomial passes both tests above, i.e. $c_n$ is nonzero, then it can be computed using formula (10) recursively. The cheapest way to do it is perhaps using a binary tree with a union cell. However, I found it also efficient to use a binary tree structure with one coefficient cell. To each boolean word of zeros and ones such as 101010110011, we can assign a direction in the binary tree, left for 0 and right for 1, and it is very easy to work with the coefficients this way.

There are two more conditions which speed up the process by looking at the reduced form, which we get by erasing a vertex of degree $n + 1$ which has $n + 1$ neighbors of degree $n + 1$ and adding a new n-cell between its $n + 1$ neighbors. So if the reduced form is different from the initial form of the monomial, then we can simply copy the coefficient from the database. Finally, if the monomial has a vertex of degree $n$, then we can erase the n-cycle passing through that vertex, adjust the remaining degrees and copy the coefficient of the new monomial from the database with negative sign. Now we can write $k_i = \sum o_n c_n$ and the formula for $p_n^E$ can be written as

$$p_n^E=\sum_i k_i\,x^i.$$
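The degree-counting spike test described above can be sketched in a few lines of Python. The representation of a monomial as a list of cells, each cell being a tuple of vertex labels, and the names `vertex_degrees` and `has_spike` are assumptions made for this illustration.

```python
from collections import Counter


def vertex_degrees(cells):
    """Number of cells covering each vertex; a cell is a tuple of vertex labels."""
    degrees = Counter()
    for cell in cells:
        degrees.update(cell)  # each vertex of the cell gains one covering cell
    return degrees


def has_spike(cells, n):
    """True if some vertex is covered by exactly n cells (the n-spike test)."""
    return any(d == n for d in vertex_degrees(cells).values())


triangle = [(0, 1), (1, 2), (0, 2)]
print(has_spike(triangle, 1))             # every vertex has degree 2
print(has_spike(triangle + [(2, 3)], 1))  # vertex 3 has degree 1
```

For graphs ($n = 1$) this flags exactly the monomials with a dangling edge, which is the cheap first filter before the more expensive n-intersection test.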

Once we have the formula for $p_n^E$ calculated once and for all, we can simply evaluate it using a nested sequence of multiplications, that is

$$p_n^E = x^3\Bigl(\dots\Bigl(\Bigl(k_{\binom{m}{n+1}}\,x + k_{\binom{m}{n+1}-1}\Bigr)x + k_{\binom{m}{n+1}-2}\Bigr)x + \dots + k_3\Bigr).$$



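pN = 0;
kN = 0;
for (i = Binom(m, n+1); i >= 3; i--)
{
    // k_i: sum of (orbit order) * (coefficient) over the monomials of degree i
    while ((monomial = GenerateNewMonomial(i)) != NULL)
    {
        // monomials with a spike or failing the n-intersection test contribute zero
        if (Spike(monomial) == false)
            if (NIntersection(ContractMonomial(monomial)) == false)
            {
                newMonomial = ReduceMonomial(monomial);
                oN = CalculateOrbitOrder(newMonomial);
                if (newMonomial != monomial)
                    // reduced form already known: copy its coefficient
                    cN = ReadCoefficient(newMonomial);
                else if (vertexOfDegreeNExists(monomial) == true)
                    // erase the cycle through that vertex and flip the sign
                    cN = -ReadCoefficient(EraseCycle(newMonomial));
                else
                    cN = CalculateCoefficientUsingFormula(monomial);
                kN += oN * cN;
            }
    }
    pN = pN * x + kN;   // one step of the nested multiplication
    kN = 0;
}
return pN * x * x * x;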



Algorithm for computing expected n-th betti number 
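The final evaluation step, the nested sequence of multiplications, can be illustrated on its own. This is a minimal Python sketch that assumes the coefficients $k_i$ have already been computed and are listed from the highest degree down to $k_3$; the function name is hypothetical, and the sample coefficients 2, -6, 3, 4 are those of the $m = 4$ polynomial quoted in this section.

```python
def horner_from_cubic(coeffs_high_to_low, x):
    """Evaluate x^3 * (k_d x^(d-3) + ... + k_4 x + k_3) by nested multiplication.

    coeffs_high_to_low lists k_d, k_(d-1), ..., k_3.
    """
    acc = 0
    for k in coeffs_high_to_low:
        acc = acc * x + k  # one multiplication and one addition per coefficient
    return acc * x * x * x


coeffs_m4 = [2, -6, 3, 4]  # k_6, k_5, k_4, k_3
print(horner_from_cubic(coeffs_m4, 1))  # 3, the first betti number of the full graph on 4 vertices
print(horner_from_cubic(coeffs_m4, 2))  # 16
```

Compared with evaluating each power of $x$ separately, this uses one multiplication per coefficient, which matters when the degree is large.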

6 Applications. Coverage Problems and Large 
High-Dimensional Data 

There are perhaps thousands of problems where the model of stochastic homology 
can be applied. We will focus on two of them - coverage problems and high- 
dimensional data with noise. 

A coverage problem is posed when a region is given together with subsets of the region, and the question is whether these sets cover the entire region. Perhaps the cheapest way to solve such a problem is to use algebraic topology methods and restate the question as whether the compactified union of subsets has the same homology as the compactified region. As an example, you can take the covering sets to be circles and associate the centers of the circles with cell phone towers, and the circles themselves with the regions where the signal is clear.




Coverage Problem 

The zeroth betti number of the union of circles tells whether the region is connected, i.e. whether each cell tower can be connected to any other via a path. The first betti number gives the number of cycles which cannot be contracted, i.e. the holes in the coverage. Nowadays applied algebraic topology methods, such as persistent homology, can answer such problems.

In real life, however, things are not that simple. Noise usually appears on the map of cell tower coverage. Bad weather, interception and component failure are the most common factors, and the map in real time looks more like an animation of appearing and disappearing, contracting and expanding regions which might not even be circles. In this complicated formulation, we only have some statistics of the work times of the towers and of the quality of the signal between towers for some period of time. This is enough to construct a random complex and apply stochastic homology methods in order to find some estimate of whether the network had good signal coverage over time.
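As a toy illustration of this setting, the sketch below computes exact expected zeroth and first betti numbers of the 1-skeleton of a small tower network by enumerating all states. The modelling choices are assumptions made here, not taken from the text: towers and links fail independently, a link counts only when both of its towers are up, and no 2-cells are filled in, so the expected $b_1$ is an upper estimate of the expected number of coverage holes. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def betti01(vertices, edges):
    """Return (b_0, b_1) of a graph: components and independent cycles."""
    parent = {v: v for v in vertices}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    components = len(parent)
    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
            components -= 1
    return components, len(edges) - len(parent) + components


def expected_betti01(tower_up, link_up):
    """tower_up: {tower: P(up)}; link_up: {(a, b): P(link works)}."""
    towers, links = list(tower_up), list(link_up)
    e0 = e1 = Fraction(0)
    for t_state in product((0, 1), repeat=len(towers)):
        for l_state in product((0, 1), repeat=len(links)):
            prob = Fraction(1)
            for t, on in zip(towers, t_state):
                prob *= tower_up[t] if on else 1 - tower_up[t]
            for l, on in zip(links, l_state):
                prob *= link_up[l] if on else 1 - link_up[l]
            up = {t for t, on in zip(towers, t_state) if on}
            # a link is live only if it works and both endpoint towers are up
            live = [l for l, on in zip(links, l_state)
                    if on and l[0] in up and l[1] in up]
            b0, b1 = betti01(up, live)
            e0 += prob * b0
            e1 += prob * b1
    return e0, e1


half = Fraction(1, 2)
print(expected_betti01({"a": half, "b": half}, {("a", "b"): half}))
```

The enumeration grows exponentially with the number of towers and links, so for realistic networks one would replace it by the polynomial machinery of the previous section or by sampling.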

Another problem which can be associated with stochastic homology is the homology type of high-dimensional data. You can think of such data as the set formed by a large collection of pictures, where each picture is mapped to a point in a very high-dimensional space $\mathbb{R}^d$, $d$ being the number of pixels in the picture. Another example is financial data: each point is associated with a vector whose components are the prices of the products in the market. The evolution in time creates a path, which might be considered to lie on a manifold or on a set with variance in the normal bundle. We can approximate the manifold locally with the best-fit hyperplane, or the one which minimizes the squared distances. In both cases, the usual persistent homology methods can be used, but the use of Voronoi cells, weak witnesses, etc. in order to create a complex restricts us to a certain level of certainty. The use of stochastic homology breaks these limits and allows us to use a level of uncertainty in our model. We can either guess that a point of the manifold exists, even though there is no evidence in the data that it is true, or we can simply thin the manifold in the regions where it is not dense.

References 

[1] Carlsson, Topology and Data, Bulletin of the American Mathematical Society 46-2, 2009

[2] de Silva, Ghrist, Coverage in Sensor Networks via Persistent Homology, Algebraic & Geometric Topology 7, 2007






